Dirichlet Process Mixture Model

Start from the finite mixture model...

Traditional representation:

$$\pi \mid \alpha \sim \mathrm{Dir}(\alpha/K, \ldots, \alpha/K), \qquad z_i \mid \pi \sim \mathrm{Cat}(\pi),$$
$$\theta_k \mid \lambda \sim H(\lambda), \qquad x_i \mid z_i = k, \theta \sim F(\theta_k).$$

Another representation (in which $H$ and $F$ are both Gaussian): each observation draws its own parameter,

$$\bar{\theta}_i \sim G, \qquad x_i \sim F(\bar{\theta}_i),$$

where $G$ is a discrete measure whose atoms $\theta_k$ are drawn from the base distribution $H$:

$$G(\theta) = \sum_{k=1}^{K} \pi_k \, \delta_{\theta_k}(\theta).$$
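
To make this representation concrete, here is a minimal sampling sketch, assuming (as in the Gaussian example above) $H = \mathcal{N}(0, 1)$ and $F(\theta) = \mathcal{N}(\theta, 1)$; the symmetric Dirichlet weights and the sizes `K` and `N` are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 3, 500

pi = rng.dirichlet(np.ones(K))           # mixture weights pi_1..pi_K
theta = rng.normal(0.0, 1.0, size=K)     # atoms theta_k ~ H = N(0, 1)

# G puts mass pi_k on atom theta_k; drawing theta_bar_i ~ G just picks an atom
theta_bar = theta[rng.choice(K, size=N, p=pi)]

x = rng.normal(theta_bar, 1.0)           # x_i ~ F(theta_bar_i) = N(theta_bar_i, 1)
```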

DPMM

The DPMM replaces the finite Dirichlet prior on the mixture weights with an infinite stick-breaking (GEM) prior:

$$\pi \mid \alpha \sim \mathrm{GEM}(\alpha), \qquad z_i \mid \pi \sim \mathrm{Cat}(\pi), \qquad \theta_k \mid \lambda \sim H(\lambda), \qquad x_i \mid z_i = k, \theta \sim F(\theta_k).$$

Here $\pi \sim \mathrm{GEM}(\alpha)$ is the stick-breaking construction:

$$\beta_k \sim \mathrm{Beta}(1, \alpha), \qquad \pi_k = \beta_k \left(1 - \sum_{l=1}^{k-1} \pi_l \right) = \beta_k \prod_{l=1}^{k-1} (1 - \beta_l).$$
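
A small sketch of the construction as a truncated sampler; the truncation level `K_trunc` is an illustrative device, not part of the model:

```python
import numpy as np

def stick_breaking(alpha, K_trunc, rng):
    """Draw mixture weights pi ~ GEM(alpha), truncated after K_trunc sticks."""
    beta = rng.beta(1.0, alpha, size=K_trunc)          # beta_k ~ Beta(1, alpha)
    # pi_k = beta_k * prod_{l<k} (1 - beta_l): break off a beta_k fraction
    # of whatever length of the unit stick is still remaining
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - beta)[:-1]))
    return beta * remaining

rng = np.random.default_rng(0)
pi = stick_breaking(alpha=1.0, K_trunc=50, rng=rng)
print(pi[:5], pi.sum())   # weights decay quickly; the sum approaches 1 as K_trunc grows
```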

Another representation: as before, each observation draws its own parameter,

$$\bar{\theta}_i \sim G, \qquad x_i \sim F(\bar{\theta}_i),$$

where $G$ is a random measure, $G \sim \mathrm{DP}(\alpha, H)$:

$$G(\theta) = \sum_{k=1}^{\infty} \pi_k \, \delta_{\theta_k}(\theta).$$

One can show that the stick-breaking weights sum to 1 with probability 1, which means samples from a DP are discrete with probability 1. As $N \to \infty$, the expected number of occupied clusters grows as $\mathbb{E}[K] \approx \alpha \log N$, so model complexity grows only logarithmically with dataset size.
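
This growth can be checked with a quick simulation sketch using the Chinese restaurant process view of the cluster assignments (the same assignment prior used in the Gibbs sampler below); the function name and settings are illustrative:

```python
import numpy as np

def crp_num_clusters(N, alpha, rng):
    """Sequentially seat N customers by the Chinese restaurant process
    and return the number of occupied tables (clusters)."""
    counts = []                                   # counts[k] = customers at table k
    for _ in range(N):
        probs = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(counts):
            counts.append(1)                      # open a new table
        else:
            counts[k] += 1
    return len(counts)

rng = np.random.default_rng(0)
alpha = 2.0
for N in [100, 1000, 10000]:
    K_avg = np.mean([crp_num_clusters(N, alpha, rng) for _ in range(10)])
    print(N, K_avg, alpha * np.log(N))            # K grows roughly like alpha * log N
```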

Model fitting (Gibbs sampling)

$$p(z_i = k \mid z_{-i}, x, \alpha, \lambda) \propto p(z_i = k \mid z_{-i}, \alpha)\, p(x_i \mid x_{-i,k}, \lambda)$$

$$p(z_i = k \mid z_{-i}, \alpha) = \begin{cases} \dfrac{N_{k,-i}}{\alpha + N - 1} & \text{if cluster } k \text{ has been seen before} \\[1.5ex] \dfrac{\alpha}{\alpha + N - 1} & \text{if } k \text{ is a new cluster} \end{cases}$$

$$p(x_i \mid x_{-i,k}, \lambda) = \frac{p(x_i, x_{-i,k} \mid \lambda)}{p(x_{-i,k} \mid \lambda)}$$

where $p(x_i, x_{-i,k} \mid \lambda)$ is the marginal likelihood of all the data assigned to cluster $k$, including $x_i$, and $p(x_{-i,k} \mid \lambda)$ is the analogous expression excluding $x_i$.
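
For example, the assignment prior above can be evaluated with a small helper (a sketch; `crp_prior` is an illustrative name):

```python
import numpy as np

def crp_prior(counts_minus_i, alpha):
    """p(z_i = k | z_{-i}, alpha): existing cluster k gets N_{k,-i} / (alpha + N - 1),
    a brand-new cluster gets alpha / (alpha + N - 1)."""
    counts = np.asarray(counts_minus_i, dtype=float)   # N_{k,-i} for each existing cluster
    denom = alpha + counts.sum()                       # counts.sum() = N - 1 (x_i held out)
    return np.concatenate([counts, [alpha]]) / denom   # last entry = new-cluster probability

print(crp_prior([3, 5, 1], alpha=1.0))   # e.g. 9 other points spread over 3 clusters
```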

Example: DPMM algorithm for clustering:

Randomly assign all observations to clusters
repeat
  pick an observation and unassign it from its current cluster
  sample a new cluster for it using the Gibbs update below
until convergence

Gibbs sampling update for choosing the cluster:

$$p(z_i = k \mid z_{-i}, x, \alpha) \propto \begin{cases} \dfrac{N_k}{N + \alpha}\, \mathcal{N}\!\left(x_i,\ \dfrac{N_k \bar{x}_k}{N_k + 1},\ 1\right) & \text{existing cluster } k \\[2ex] \dfrac{\alpha}{N + \alpha}\, \mathcal{N}(x_i,\ 0,\ 1) & \text{new cluster} \end{cases}$$

under the assumption that the base distribution $H$ is a standard normal $\mathcal{N}(0, 1)$ and the likelihood $F$ is Gaussian with unit variance. Here $\mathcal{N}(x, \mu, \Sigma)$ denotes the density of $x$ under a Gaussian with mean $\mu$ and variance $\Sigma$.
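
Putting the pieces together, here is a minimal sketch of the collapsed Gibbs sampler for 1-D data under these assumptions ($H = \mathcal{N}(0, 1)$, unit-variance Gaussian likelihood); `dpmm_gibbs` and the toy data are illustrative, and the unit predictive variance follows the simplified formula above (the exact conjugate predictive variance would be $1 + 1/(N_k + 1)$ for an existing cluster and $2$ for a new one):

```python
import numpy as np

def gauss(v, mean, var):
    """Density of v under a Gaussian with the given mean and variance."""
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def dpmm_gibbs(x, alpha=1.0, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for a 1-D DPMM with H = N(0, 1) base measure
    and unit-variance Gaussian likelihood, following the update above."""
    rng = np.random.default_rng(seed)
    N = len(x)
    z = np.zeros(N, dtype=int)                 # start with everything in one cluster
    counts = {0: N}                            # N_k for each active cluster
    sums = {0: float(np.sum(x))}               # sum of the x_j assigned to each cluster

    for _ in range(n_iters):
        for i in range(N):
            # unassign x_i from its current cluster
            k_old = z[i]
            counts[k_old] -= 1
            sums[k_old] -= x[i]
            if counts[k_old] == 0:
                del counts[k_old], sums[k_old]

            # unnormalized probabilities for every existing cluster and a new one;
            # predictive mean N_k * xbar_k / (N_k + 1) = sums[k] / (counts[k] + 1)
            labels = list(counts)
            probs = [counts[k] / (N + alpha)
                     * gauss(x[i], sums[k] / (counts[k] + 1), 1.0)
                     for k in labels]
            probs.append(alpha / (N + alpha) * gauss(x[i], 0.0, 1.0))
            probs = np.array(probs)

            # sample a new assignment for x_i and update the sufficient statistics
            j = rng.choice(len(probs), p=probs / probs.sum())
            k_new = labels[j] if j < len(labels) else (max(counts, default=-1) + 1)
            z[i] = k_new
            counts[k_new] = counts.get(k_new, 0) + 1
            sums[k_new] = sums.get(k_new, 0.0) + x[i]
    return z

# toy data: two well-separated Gaussian clusters
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3.0, 1.0, 50), rng.normal(3.0, 1.0, 50)])
z = dpmm_gibbs(x, alpha=1.0)
print("number of clusters found:", len(np.unique(z)))
```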

References

"Machine Learning" Lecture 17: http://www.umiacs.umd.edu/~jbg/teaching/CSCI_5622/
Book: Kevin Murphy, Machine Learning: A Probabilistic Perspective, Chapter 25